Group By Operations in Pandas
GroupBy operations are one of the most powerful features in Pandas for data analysis. They allow you to:
- Split data into groups based on criteria
- Apply functions to each group independently
- Combine the results back into a DataFrame
This notebook covers essential groupby techniques, including basic aggregation functions and applying multiple aggregations at once with agg().
1. Setting Up Sample Data
Let's create a sample dataset to demonstrate groupby operations. We'll work with a categorical column (Category) and a numerical column (Value).
```python
import pandas as pd

# Sample data: a categorical grouping column and a numeric value column
data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
}
df = pd.DataFrame(data)
```
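To make the split-apply-combine steps concrete, here is a minimal sketch that performs them by hand on the sample data and compares the result with the built-in `groupby().sum()`. The manual loop is only for illustration; in practice `groupby` does all three steps for you.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Split: iterating over a GroupBy yields (group key, sub-DataFrame) pairs
# Apply: reduce each group's 'Value' column to a single number
totals = {key: group['Value'].sum() for key, group in df.groupby('Category')}

# Combine: assemble the per-group results into one Series indexed by category
manual = pd.Series(totals, name='Value').rename_axis('Category')

# The built-in aggregation produces the same per-category totals
builtin = df.groupby('Category')['Value'].sum()
print(manual.to_dict() == builtin.to_dict())  # True
```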
2. Basic Aggregation Functions
Groupby operations allow you to calculate summary statistics for each group. Here are the most common aggregation functions:
Sum Aggregation
Calculate the total sum of values for each category:
```python
df.groupby('Category').sum()
```
| Category | Value |
|---|---|
| A | 60 |
| B | 40 |
Mean Aggregation
Calculate the average value for each category:
```python
df.groupby('Category').mean()
```
| Category | Value |
|---|---|
| A | 20.0 |
| B | 20.0 |
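Since the mean is the group total divided by the number of values in the group, one quick sanity check is to rebuild it from `sum()` and `count()`. This sketch selects the `Value` column so each result is a Series; note that `count()` counts only non-missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')['Value']

# mean = total / number of non-missing values in each group
rebuilt = grouped.sum() / grouped.count()

print(rebuilt.tolist())                # [20.0, 20.0]
print(rebuilt.equals(grouped.mean()))  # True
```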
Median Aggregation
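Calculate the median (middle) value for each category. When a group has an even number of values, as group B does, the median is the average of the two middle values: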
```python
df.groupby('Category').median()
```
| Category | Value |
|---|---|
| A | 20.0 |
| B | 20.0 |
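Group B has only two values (15 and 25), so there is no single middle element. The sketch below reproduces the pandas result by hand with the standard library's `statistics.median`, which uses the same average-of-the-two-middle-values convention.

```python
import statistics

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Group A: sorted values 10, 20, 30 -> the middle element is 20
# Group B: sorted values 15, 25 -> (15 + 25) / 2 = 20.0
by_hand = {
    key: statistics.median(group['Value'].tolist())
    for key, group in df.groupby('Category')
}
pandas_median = df.groupby('Category')['Value'].median().to_dict()

print(by_hand)                   # {'A': 20, 'B': 20.0}
print(by_hand == pandas_median)  # True
```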
Maximum Values
Find the highest value in each category:
```python
df.groupby('Category').max()
```
| Category | Value |
|---|---|
| A | 30 |
| B | 25 |
Minimum Values
Find the lowest value in each category:
```python
df.groupby('Category').min()
```
| Category | Value |
|---|---|
| A | 10 |
| B | 15 |
Standard Deviation
Measure the spread of values within each category. By default, pandas computes the sample standard deviation (ddof=1, dividing by n - 1):
```python
df.groupby('Category').std()
```
| Category | Value |
|---|---|
| A | 10.000000 |
| B | 7.071068 |
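The `ddof` argument controls the divisor. Passing `ddof=0` gives the population standard deviation (dividing by n) instead of the default sample version, which makes a noticeable difference for small groups like these.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')['Value']

sample_std = grouped.std()            # ddof=1 (default): divide by n - 1
population_std = grouped.std(ddof=0)  # ddof=0: divide by n

print(sample_std.round(4).tolist())      # [10.0, 7.0711]
print(population_std.round(4).tolist())  # [8.165, 5.0]
```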
Variance
Calculate the variance (the square of the standard deviation) for each category. Like std(), var() uses ddof=1 by default:
```python
df.groupby('Category').var()
```
| Category | Value |
|---|---|
| A | 100.0 |
| B | 50.0 |
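Because variance and standard deviation share the same `ddof` convention, squaring the group standard deviations should reproduce the variances up to floating-point rounding. A quick check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')['Value']
variance = grouped.var()
std_squared = grouped.std() ** 2

print(variance.round(6).tolist())          # [100.0, 50.0]
print(np.allclose(variance, std_squared))  # True
```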
3. Multiple Aggregations
You can apply several aggregation functions at once by passing a list of function names to the agg() method. The result has one column per function, nested under the original column name (Value).
Applying Multiple Functions
Calculate sum, mean, and maximum for each category in one operation:
```python
df.groupby('Category').agg(['sum', 'mean', 'max'])
```
| Category | Value (sum) | Value (mean) | Value (max) |
|---|---|---|---|
| A | 60 | 20.0 | 30 |
| B | 40 | 20.0 | 25 |
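Since agg() with a list of functions produces two levels of column labels (the original column, then the function name), a single statistic is selected with a tuple. The sketch also flattens the labels into plain strings; the underscore-joined naming is just one convention chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

summary = df.groupby('Category').agg(['sum', 'mean', 'max'])

# Two-level column labels: ('Value', 'sum'), ('Value', 'mean'), ('Value', 'max')
print(summary.columns.tolist())

# Select one statistic with a (column, function) tuple
print(summary[('Value', 'sum')].tolist())  # [60, 40]

# Flatten the labels, e.g. ('Value', 'sum') -> 'Value_sum'
summary.columns = ['_'.join(col) for col in summary.columns]
print(summary.columns.tolist())  # ['Value_sum', 'Value_mean', 'Value_max']
```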
Summary
GroupBy operations are essential for data analysis in Pandas. In this notebook, you learned:
Basic Aggregation Functions
- sum(): Total values per group
- mean(): Average values per group
- median(): Middle value per group
- max()/min(): Highest/lowest values per group
- std()/var(): Measure spread within groups
Advanced Operations
- agg(): Apply multiple functions simultaneously
- Combine statistics for comprehensive group analysis
Key Concepts
- Split-Apply-Combine: The three-step process of groupby operations
- Aggregation: Reducing groups to single values (sum, mean, etc.)
- Multiple Functions: Use agg() for comprehensive summaries
Best Practices
- Choose appropriate aggregation functions for your data type
- Use multiple aggregations to get complete group insights
- Consider data distribution when selecting measures (mean vs median)
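To see why the distribution matters, consider a hypothetical group containing one extreme value. The mean is pulled toward the outlier while the median stays near the typical values; the data below is made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: group 'C' contains one outlier (1000), group 'D' does not
skewed = pd.DataFrame({
    'Category': ['C', 'C', 'C', 'C', 'D', 'D', 'D'],
    'Value': [10, 12, 14, 1000, 10, 12, 14],
})

# One row per category, one column per statistic
stats = skewed.groupby('Category')['Value'].agg(['mean', 'median'])
print(stats)
# C: mean 259.0 (dragged up by 1000), median 13.0
# D: mean 12.0, median 12.0
```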
Next Steps
- Explore groupby with multiple columns
- Learn filtering and transformation operations
- Practice with real datasets for business insights
Mastering groupby operations will significantly enhance your data analysis capabilities!